AITopics | diffusion model

Collaborating Authors

diffusion model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions

Wu, Jingda, Cai, Changxiao

arXiv.org Machine LearningMay-29-2026

Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their statistical efficiency remains limited. Existing theories typically rely on strong regularity assumptions, such as uniformly bounded densities or globally smooth score functions, which fail to capture such intrinsic structures. In this work, we study the sample complexity of diffusion models for learning distributions supported on a union of low-dimensional subspaces. Assuming that the data distribution within each subspace is subgaussian, we show that diffusion models require at most $\widetilde{O}(\varepsilon^{-k \vee 2})$ samples to achieve $\varepsilon$ error in 1-Wasserstein distance, where $k$ is the intrinsic dimension. This near-optimal convergence rate depends only on the intrinsic dimension and significantly improves upon prior theoretical guarantees that suffer from the curse of dimensionality. Notably, our analysis applies to a broad collection of distributions without imposing smoothness, bounded-density, or log-concavity assumptions. Overall, our results show that diffusion models can statistically adapt to intrinsic low-dimensional structure while naturally accommodating multi-modal data, offering a rigorous theoretical justification for their success in complex high-dimensional learning tasks.

artificial intelligence, estimator, machine learning, (12 more...)

arXiv.org Machine Learning

2605.30153

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.54)

Add feedback

Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models

Takeda, Ken, Oizumi, Masafumi, Karakida, Ryo

arXiv.org Machine LearningMay-28-2026

Generative models, including diffusion models, are increasingly used as foundation models and adapted through sequential fine-tuning, making continual learning an essential problem setting. However, continual learning in such generative models remains poorly understood: after a task change, what aspects of the learned distribution are most easily lost, and what replay samples should be prioritized? We address these questions through the modern Hopfield energy. Recent links between modern Hopfield networks (MHNs) and diffusion models allow analyses in MHNs to be transferred to diffusion models. We introduce intrinsic forgetting as an increase in Hopfield energy after the task change. In tractable settings in an MHN, we prove that high-energy, outlier-like samples undergo a larger energy increase than cluster-like samples, implying that samples located in sharp, isolated basins are more forgettable. We further analyze memory replay and show that replay is particularly effective for high-energy samples, enabling an energy-based selection of replay samples. We validate these predictions in experiments on MHNs and two diffusion models under continual-learning settings: Stable Diffusion and a pixel-space DDPM. In these diffusion models, Hopfield energy tracks reconstruction-based forgetting, and replay experiments reveal energy-dependent mitigation of forgetting that is consistent with the MHN analysis.

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Machine Learning

2605.27975

Country: Asia > Japan (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Latent Diffusion for Missing Data

Estad, Alberte Heering, Peis, Ignacio, Frellsen, Jes

arXiv.org Machine LearningMay-28-2026

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting diffusion to a learned latent representation improves robustness under missing-completely-at-random (MCAR) corruption. To this end, we propose a two-stage framework: a robust VAE-based imputer first learns compact semantic features from incomplete observations, and a diffusion model is then trained in the resulting latent space. Across training missing rates, we perform a controlled comparison against pixel-space diffusion models under the same incomplete-data setting. The latent diffusion model maintains high sample quality and remains stable up to 50\% missingness, while pixel-space diffusion degrades progressively as missingness increases. For downstream imputation, latent diffusion also achieves consistently better performance than pixel-space diffusion. These findings indicate that latent-space modeling mitigates artifact amplification from zero-imputed inputs and provides a more robust generative prior for incomplete-data learning. Overall, our results support latent diffusion as a strong and practically useful alternative to pixel-space diffusion for missing-data problems.

artificial intelligence, diffusion model, machine learning, (16 more...)

arXiv.org Machine Learning

2605.28427

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

Sampling Data with Chains of Forward-Backward Diffusion Steps

Kang, Hyunmo, Levi, Noam Itzhak, Wegner, Corinna Elena, Korchinski, Daniel J., Wyart, Matthieu

arXiv.org Machine LearningMay-27-2026

Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data manifold and, paired with a Metropolis-Hastings correction, samples from energy-modified targets. For synthetic languages, we show that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold; ergodicity is restored at larger U-turn magnitude. In the non-ergodic regime, low-level features relax faster than high-level ones, an ordering that inverts only at sufficiently large U-turn magnitude. We test these predictions on natural language and natural images. In both modalities, minimal U-turns relax slowly, especially for high-level features approximated by deep representations in CNNs or LLMs. The layer-ordering inversion appears only at large noise when mixing is efficient -- signatures consistent with strongly constrained, weakly mixing local dynamics. We discuss the implications of these results for sampling with diffusion models.

diffusion model, large language model, machine learning, (21 more...)

arXiv.org Machine Learning

2605.27006

Country: North America > United States (0.67)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
(2 more...)

Add feedback

From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

Liang, Yuchen, Shroff, Ness, Liang, Yingbin

arXiv.org Machine LearningMay-27-2026

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this work, we propose a novel Gibbs-based corrector for discrete diffusion models, termed Gibbs-Accelerated Discrete Diffusion (GADD). GADD leverages the structure of the concrete score function to construct Gibbs posterior likelihoods directly, without requiring any additional training beyond standard score estimation. We show that GADD achieves an overall sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$, yielding the first such rate for diffusion-based samplers for uniform-rate discrete diffusion models. We also conduct numerical experiments demonstrating the practical advantages of GADD across synthetic data, zero-shot text sampling, and zero-shot conditional music generation. These results corroborate the theory and show that GADD consistently improves sample quality and wall-clock efficiency over standard baselines, including vanilla Euler methods and CTMC correctors. Beyond this, our theoretical analysis introduces a novel framework for analyzing predictor-corrector methods in discrete diffusion models, which may be of independent interest. Unlike existing approaches that rely on the Girsanov change-of-measure technique, our method is based on an induction argument that tracks error propagation across predictor iterations while accounting for inaccuracies in the corrector updates.

diffusion model, large language model, machine learning, (20 more...)

arXiv.org Machine Learning

2605.27352

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate

Wei, Lifu, Ren, Yinuo, Shi, Naichen, Lu, Yiping

arXiv.org Machine LearningMay-26-2026

Data assimilation (DA) addresses the problem of sequentially estimating the state of a dynamical system from noisy and incomplete observations. In this work, we employ a diffusion model as a world model to simulate and predict the system's dynamics. Recently, score-based diffusion models have learned global diffusion priors that effectively model (stochastic) dynamics, revealing strong potential for data assimilation. In this paper, we investigate how information from noisy observations can be incorporated to enable continuous correction and refinement of the predicted system state when using a diffusion prior. Motivated by particle filtering methods, we represent the posterior distribution using a set of particles. After receiving noisy observations, the diffusion model is guided using the observation likelihood to steer the generation process toward observation-consistent states. Nevertheless, such guidance does not guarantee sampling from the true posterior. We therefore employ a Sequential Monte Carlo approach over the diffusion trajectory, viewed as a path measure, to reweight and resample particles, thereby correcting the generation process and ensuring convergence toward the desired posterior distribution. This leads to an unbiased particle filtering method that rigorously fuses observational data with diffusion model simulations.

artificial intelligence, machine learning, particle, (16 more...)

arXiv.org Machine Learning

2605.18745

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

Wang, Weixin, Yang, Yu, Deng, Wei, Xu, Pan

arXiv.org Machine LearningMay-26-2026

We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate reward-tilted target distributions in a principled way, but their proposals remain largely tied to the base sampler. Since reward information is mainly used after propagation through particle reweighting and resampling, these methods can require large particle budgets and suffer from weight degeneracy and high-variance estimates. One way to reduce variance and improve particle efficiency is to iteratively learn twisting functions that provide look-ahead guidance, as in twisted SMC. However, existing learnable twisting methods are developed mainly for classical sequential inference and can be unstable when applied to diffusion-based alignment with high-dimensional state spaces and terminal, noisy, or black-box rewards. We propose Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a trust-region framework for learning twisting functions in SMC-based inference-time alignment. Each iteration computes an exact KL-constrained update in path space, which admits a closed-form solution by tempered importance reweighting, and projects this target back to the parameterized twisted family by weighted maximum likelihood. Theoretically, we formalize the value-function interpretation of the optimal twisting function and show that it yields a zero-variance sampler. We prove that the trust-region update follows an escort path toward the target distribution, that the weighted maximum-likelihood update is a forward-KL projection, and that the path reduces residual importance-weight variance. Empirically, TRI-TSMC improves primary alignment objectives on discrete diffusion text generation and text-to-image generation under matched inference-time budgets.

fk-steering, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.25123

Country: Asia > Middle East (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

Cheng, Ziheng, Huang, Yixiao, Zhu, Hanlin, Geng, Haoran, Sojoudi, Somayeh, Malik, Jitendra, Abbeel, Pieter, Guo, Xin

arXiv.org Machine LearningMay-26-2026

Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion policies. This naturally leads to a multi-objective learning (MOL) problem. A key challenge is that achieving good Pareto trade-offs can require a generalist model class with substantially larger capacity than what suffices for solving any individual task, thereby increasing statistical cost since sample complexity typically scales with the model complexity. To reconcile this, we develop a principled MOL framework for diffusion models with limited data: a semi-supervised regime where paired (labeled) samples are scarce, but (unlabeled) condition data are abundant. We propose a two-stage training procedure that first fits lightweight specialist models from limited paired data, and then distills them into a generalist model by generating pseudo-samples. We establish generalization bounds showing that the required number of paired samples only depends on the complexity of the specialist model classes. We further extend the theory to diffusion policies for sequential decision making to account for distribution shift in on-policy rollouts. Extensive experiments on robotic control and image restoration tasks are conducted to verify our theoretical results.

artificial intelligence, machine learning, multi-objective learning, (14 more...)

arXiv.org Machine Learning

2605.2521

Country:

North America > United States (0.46)
Europe (0.28)
Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective

Perera, David, Moura, Victor, Santos, Lais Isabelle Alves dos, Haddad, Michel F. C., Figueiredo, Flavio

arXiv.org Machine LearningMay-22-2026

Characterizing precisely the asymptotic generalization error of neural networks using parameters that can be estimated efficiently is a crucial problem in machine learning, which relies heavily on heuristics and practitioners' intuition to make key design choices. In order to mitigate this issue, we introduce the Representation Gap, a metric closely related to the generalization error, but admitting better-behaved asymptotic dynamics. Focusing on equivariant diffusion models and leveraging results from optimal quantization and point-process theory, we derive a precise asymptotic equivalent of the Representation Gap and show that it is governed by a single parameter, the \textit{intrinsic dimension} of the task, which is easy to interpret, efficient to estimate, and can be linked to the equivariances of common neural network architectures. We show that this asymptotic dynamic also extends to a broader range of tasks and training algorithms. Finally, we demonstrate empirically that our asymptotic law and intrinsic dimension estimation are accurate on a wide range of synthetic datasets, where these quantities are known, as well as on more realistic datasets, where we obtain results consistent with the related literature.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

2605.21692

Country:

North America (0.46)
Europe > United Kingdom (0.28)
South America > Brazil > Minas Gerais (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation

Gourevitch, Samson, Janati, Yazid, Shariatian, Dario, Simsekli, Umut, Moulines, Eric, Xing, Eric P., Durmus, Alain

arXiv.org Machine LearningMay-22-2026

Discrete diffusion models are often trained through clean-data prediction, but the prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We show that the standard plug-in bridge parameterization for UDM is not optimized by the denoising posterior, but by a leave-one-out posterior that predicts each clean token without using its own noisy observation. This identifies a mismatch between the plug-in ELBO and the usual cross-entropy denoising objective. We characterize the leave-one-out target and derive exact conversions between the denoiser, the leave-one-out posterior, and the score. These conversions allow us to disentangle parameterization and training objective. Our results also lead to inference improvements without any additional training through an informed predictor-corrector sampler and improved temperature sampling based on the leave-one-out predictor. We further introduce an absorbing-state reformulation of uniform diffusion that preserves the UDM joint law while decomposing it into masked-diffusion-like sampling operations, with simpler denoising posteriors, carry-over unmasking, and a natural remasking mechanism. On language modeling, leave-one-out parameterizations consistently improve UDM generation, while the absorbing construction matches or surpasses masked diffusion. These results suggest that the empirical gap between masked and uniform diffusion is driven less by the choice of marginals themselves than by parameterization and sampling design. The code and models can be found at https://github.com/samsongourevitch/rev_udm.

artificial intelligence, machine learning, natural language, (22 more...)

arXiv.org Machine Learning

2605.22765

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

Add feedback